<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
    <link href="decompiler.css" rel="Stylesheet" type="text/css" />
    <title>Decompiler</title>
</head>
<body>
<h1>Welcome to the Decompiler home page!</h1>
<h2>Introduction</h2>
This is the home page of (yet another) open source <dfn>machine code decompiler</dfn> project. The goal of a machine 
code decompiler is to analyze executable files (like .EXE or .DLL files in Windows or ELF files in Unix-like environments) 
and attempt to create a high level representation of the machine code in the executable file:
the decompiler tries to reconstruct the source code from which the executable was compiled in the first place.
 <p>To download the Decompiler, go to the project page:<br />
     <a href="https://sourceforge.net/projects/decompiler">https://sourceforge.net/projects/decompiler</a>
 </p>
<p>
 Since compilation is a non-reversible process  (information such as comments and variable data types is irretrievably lost), decompilation can never completely recover the 
source code of a machine code executable.  However, with some <dfn>oracular</dfn> (read "human") assistance, it can go a long way
towards this goal. An oracle can provide function parameter types, the locations of otherwise unreachable code, and
    user-specified comments.</p>
<p>
The decompiler is designed to be processor- and platform-agnostic. The intent is that you should be able to use
it to decompile executables for any processor architecture and not be tied to a particular instruction set. Although currently only 
a x86 front end is implemented, there is nothing preventing you from implementing a 68K, Sparc, or VAX front end if you 
need one.
</p>
<p>
The decompiler can be run as a command-line tool, in which case it can be fed either with a simple executable file, or a 
<dfn>decompiler project file</dfn>, which not only specifies the executable file to decompile but also any oracular information that assists its work. The decompiler also has a graphical front end, which lets an operator specify oracular information
    while examining the decompiled executable.</p>
    <p>
        The outputs of the decompiler are a C source code file containing all the disassembled
        code and a header file in which type-reconstructed data types can be found.</p>
<h2>Design</h2>
<img style="float:right" src="images/design.png" alt="Diagram of decompiler design" />
The decompiler consists of several phases.
<ul>
<li>The <b>loading phase</b> loads the executable into memory 
and determines what kind of executable is being decompiled. The 
executable format usually defines the processor format and the 
expected operating system environment. For older formats, such 
as plain MS-DOS .EXE files, the processor (x86 real mode) and 
operating system environent (MS-DOS) are implicit. Once the 
format is determined, the binary is loaded into memory 
(uncompressing it if necessary) and pointer or segment 
relocations are carried out. These relocations are also helpful
in later stages of the decompiler, as each relocated pointer value
can be given a preliminary type <strong>pointer-to(&lt;unknown&gt;)</strong> and
each relocated segment selector the type <strong>segment-selector</strong>.
    </li>
    <li>
The <strong>scanning phase</strong> follows the loading phase. The executable
will usually have one or more <dfn>entry points</dfn>, addresses pointing to 
executable code. The code at the the entry points is disassembled and traced, 
looking in particular for <span class="code">branch</span>, <span class="code">call</span>, 
and <span class="code">return</span> statements. Successively, individual <strong>procedures</strong>
 are discovered, and <strong>call graph</strong> is built up, whose edges represent 
 calls between procedures.</li>
 <li>The <b>rewriting phase</b> rewrites all machine-specific instructions into 
 low-level machine-independent instructions. Idiomatic instruction sequences 
 are rewritten to expressions. From this point on, the decompilation  process is processor independent.
</li>
<li>
The <b>analysis phase</b> first does a interprocedural reaching definitions analysis. 
This is done to determine, for each procedure <em>proc</em> of the program, which
    processor registers
are preserved and which processor registers are modified after
    a call to <em>proc.</em> A subsequent interprocedural liveness analysis, combined
    with the results of the reaching definitions analysis, determins which processor
    registers are used as parameters and return value registers for each procedure. Note
    that this analysis avoids depending on a specific processor/platform ABI or calling
    convention. Once the two interprocedural analyses are complete, the procedures can
    be rewritten with their explicit arguments. Subsequent analyses are then performed
    on a procedure-by-procedure basis. Procedures are converted into <strong>SSA Form</strong>,
    condition code flags are eliminated and expressions are simplified. Finally the
    procedures are converted
    out of SSA Form.</li>
    <li>
        The interprocedural <strong>type analysis</strong> phase attempts to recover the
        data types used in the program by analyzing the way in which values are used by
        the program code, incorporating clues obtained from the relocation data
        as well as any "oracular" 
information provided by the user. Memory access expressions
        are converted into their C equivalents: pointer dereferences (<span class="code">*foo</span>), member access
        expressions (<span class="code">foo-&gt;bar</span>), and array references (<span class="code">foo[bar]</span>).
    </li>
    <li>
        Finally, a <strong>structure analysis</strong> rewrites the control structures from
        unstructured goto-sphaghetti code to C-language <span class="code">if</span>, <span class="code">while</span>- / <span class="code">do</span>-loops,
        and <span class="code">switch</span>-statements.</li>
</ul>

<h2>Development</h2>
The decompiler is written in C# and currently targets CLR version 2.0.
It's currently developed with Visual Studio 2005, but the plan
is to have a working MonoDevelop project soon (wanna pitch in?)

<p>The project implements the <dfn>Test Driven Development</dfn> methodology, with heavy emphasis on unit testing.
No new code is allowed into the project unless it has one or more associated tests written for it. Developing a 
decompiler is notoriously tricky work with lots of special cases. Not having unit tests would make development an 
eternal bug hunt as fixes for one bug introduce other bugs. Unit tests are developed using NUnit v2.2.&nbsp;</p>

<p>Subversion is used for source control.</p>

<h2>Status</h2>
    The decompiler project is in a pre-alpha stage. As it stands, it is able to load MS-DOS and PE binary files,
disassemble their contents, rewrite the disassembled instructions into intermediate code, and perform the analysis phase
mentioned above. Currently work is focussed on type analysis, while code structuring is on the back-burner as it's considerably
less complex than type recovery.If you'd like to chip in, feel free to contact us!
<br />
<br />
<div align="right"><a href="http://sourceforge.net/projects/decompiler"><img src="http://sflogo.sourceforge.net/sflogo.php?group_id=204959&type=14" width="150" height="40" border="0" alt="Get Decompiler at SourceForge.net. Fast, secure and Free Open Source software downloads" /></a>
</div>

</body>

</html>
